Overview

Dataset statistics

Number of variables: 16
Number of observations: 4240
Missing cells: 715
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 530.1 KiB
Average record size in memory: 128.0 B
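The overview figures follow from simple arithmetic on the counts. Missing cells are counted over the full grid of 16 variables × 4240 observations, and the average record size is the total memory divided by the number of rows. The minimal sketch below recomputes both, along with the per-column percentages behind the warnings. Treating 1 KiB as 1024 bytes is our assumption about the report's units.

```python
# Recompute the overview percentages from the raw counts reported above.
n_variables = 16
n_observations = 4240
missing_cells = 715
total_memory_kib = 530.1  # assumption: 1 KiB = 1024 bytes


def pct(count, total):
    """Percentage of `count` in `total`, as a float."""
    return 100 * count / total


total_cells = n_variables * n_observations
missing_pct = pct(missing_cells, total_cells)
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")

# Per-column missing percentages behind the "Missing" warnings.
missing_by_column = {"education": 110, "BP Meds": 60, "tot cholesterol": 60, "glucose": 391}
for name, count in missing_by_column.items():
    print(f"{name} has {count} ({pct(count, n_observations):.1f}%) missing values")
```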

Variable types

Categorical: 8
Numeric: 8

Warnings

education has 110 (2.6%) missing values [Missing]
BP Meds has 60 (1.4%) missing values [Missing]
tot cholesterol has 60 (1.4%) missing values [Missing]
glucose has 391 (9.2%) missing values [Missing]
cigsPerDay has 2145 (50.6%) zeros [Zeros]

Reproduction

Analysis started: 2021-04-28 06:08:33.411073
Analysis finished: 2021-04-28 06:09:00.186095
Duration: 26.78 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 8
Missing (%): 0.2%
Memory size: 33.2 KiB

Female  2414
Male    1818

Length

Max length: 6
Median length: 6
Mean length: 5.140831758
Min length: 4

Characters and Unicode

Total characters: 21756
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
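Every occurrence of a category contributes all of its characters, so the character statistics above can be reproduced from the Gender value counts alone. The minimal sketch below uses Python's standard `unicodedata` module. `CATEGORY_NAMES` is a hand-written mapping from two-letter general-category codes to the names used in this report. Script and block lookups are omitted because the standard library has no direct API for them.

```python
import unicodedata
from collections import Counter

# Gender value counts from the tables in this section (missing values excluded).
value_counts = {"Female": 2414, "Male": 1818}

# Hand-written names for the Unicode general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
}

# Each occurrence of a value contributes every one of its characters.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

# Aggregate the character counts by Unicode general category.
category_counts = Counter()
for ch, count in char_counts.items():
    code = unicodedata.category(ch)
    category_counts[CATEGORY_NAMES.get(code, code)] += count

total_chars = sum(char_counts.values())
mean_length = sum(len(v) * c for v, c in value_counts.items()) / sum(value_counts.values())

print("Total characters:", total_chars)
print("Distinct characters:", len(char_counts))
print("Most occurring categories:", category_counts.most_common())
print("Mean length:", round(mean_length, 9))
```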

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Female
3rd row: Male
4th row: Female
5th row: Female

Value      Count  Frequency (%)
Female     2414   56.9%
Male       1818   42.9%
(Missing)  8      0.2%

[Figure: Histogram of lengths of the category]

Value   Count  Frequency (%)
female  2414   57.0%
male    1818   43.0%

Most occurring characters

Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  17524  80.5%
Uppercase Letter  4232   19.5%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      6646   37.9%
a      4232   24.1%
l      4232   24.1%
m      2414   13.8%

Uppercase Letter
Value  Count  Frequency (%)
F      2414   57.0%
M      1818   43.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  21756  100.0%

Most frequent character per script

Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  21756  100.0%

Most frequent character per block

Value  Count  Frequency (%)
e      6646   30.5%
a      4232   19.5%
l      4232   19.5%
F      2414   11.1%
m      2414   11.1%
M      1818   8.4%

age
Real number (ℝ≥0)

Distinct: 39
Distinct (%): 0.9%
Missing: 2
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.57928268
Minimum: 32
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 32
5th percentile: 37
Q1: 42
Median: 49
Q3: 56
95th percentile: 64
Maximum: 70
Range: 38
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.572874944
Coefficient of variation (CV): 0.1729124441
Kurtosis: -0.989382703
Mean: 49.57928268
Median Absolute Deviation (MAD): 7
Skewness: 0.2289804971
Sum: 210117
Variance: 73.49418481
Monotonicity: Not monotonic
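The quantile and descriptive statistics are tied together by simple formulas:

- range = maximum - minimum
- IQR = Q3 - Q1
- CV = standard deviation / mean
- MAD = median of |x - median|

The minimal sketch below uses the standard `statistics` module (Python 3.8+). It assumes the sample (n - 1) standard deviation and linear-interpolation quantiles, either of which may differ slightly from the report's own implementation. The function name is illustrative.

```python
import statistics


def describe(values):
    """Quantile and descriptive statistics like those reported per numeric variable.

    Assumptions: sample (n - 1) variance, and quantiles by linear interpolation
    between order statistics (the "inclusive" method of statistics.quantiles).
    """
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(data),
        "cv": std / mean,  # coefficient of variation; undefined when the mean is 0
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
        # Median absolute deviation from the median (unscaled).
        "mad": statistics.median(abs(x - median) for x in data),
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

Applied to the age figures reported here, 8.572874944 / 49.57928268 gives the reported CV of about 0.1729, and 56 - 42 gives the IQR of 14.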
[Figure: Histogram with fixed size bins (bins=39)]
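Each numeric variable is drawn as a histogram with a fixed number of equal-width bins. In this report the bin count equals the number of distinct values for age (39) and cigsPerDay (33), and is 50 for the variables with more distinct values. The minimal sketch below shows equal-width binning in plain Python. The function name is illustrative. The half-open-bin rule follows `numpy.histogram`, which we assume the plotting code relies on.

```python
def fixed_width_histogram(values, bins):
    """Count `values` into `bins` equal-width intervals spanning [min, max].

    Bin i covers [edge_i, edge_{i+1}); the last bin is closed so that the
    maximum is counted, mirroring numpy.histogram's convention.
    """
    if bins < 1:
        raise ValueError("bins must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so return a single bin.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Floor to the bin index, then clamp the maximum into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Usage: the integer ages 32..70 spread over 39 bins put one value in each bin.
edges, counts = fixed_width_histogram(list(range(32, 71)), bins=39)
print(len(edges), sum(counts))
```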
Common values

Value              Count  Frequency (%)
40                 192    4.5%
46                 182    4.3%
42                 180    4.2%
41                 174    4.1%
48                 173    4.1%
39                 170    4.0%
44                 166    3.9%
45                 162    3.8%
43                 158    3.7%
52                 149    3.5%
Other values (29)  2532   59.7%

Minimum 5 values

Value  Count  Frequency (%)
32     1      < 0.1%
33     5      0.1%
34     18     0.4%
35     42     1.0%
36     84     2.0%

Maximum 5 values

Value  Count  Frequency (%)
70     2      < 0.1%
69     7      0.2%
68     18     0.4%
67     45     1.1%
66     38     0.9%
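The frequency tables above are plain value counts: the ten most common values, with the rest folded into "Other values (n)", followed by the five smallest and five largest distinct values. The minimal sketch below uses `collections.Counter`. It assumes missing entries are `None` and that percentages use all rows as the denominator, which matches the age figures (192 / 4240 ≈ 4.5%). The function names are illustrative.

```python
from collections import Counter


def common_values(values, top=10):
    """Most frequent values plus an "Other values" remainder.

    Missing entries are None; percentages are relative to all rows,
    including missing ones.
    """
    n_rows = len(values)
    counts = Counter(v for v in values if v is not None)
    rows = [(v, c, 100 * c / n_rows) for v, c in counts.most_common(top)]
    n_other_distinct = len(counts) - len(rows)
    other_count = sum(counts.values()) - sum(c for _, c, _ in rows)
    return rows, n_other_distinct, other_count


def extreme_values(values, k=5):
    """The k smallest and k largest distinct values with their counts."""
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts)
    smallest = [(v, counts[v]) for v in ordered[:k]]
    largest = [(v, counts[v]) for v in reversed(ordered[-k:])]
    return smallest, largest


values = [1, 1, 1, 2, 2, 3, None, 4, 5, 6, 7]
print(common_values(values, top=2))
print(extreme_values(values, k=3))
```

For age, the 39 distinct values minus the 10 listed give the 29 "other" values. Likewise, the 4238 non-missing rows minus the 1706 rows in the top ten give 2532.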

education
Categorical

MISSING

Distinct: 4
Distinct (%): 0.1%
Missing: 110
Missing (%): 2.6%
Memory size: 33.2 KiB

1.0  1717
2.0  1252
3.0  688
4.0  473

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12390
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 2.0
3rd row: 1.0
4th row: 3.0
5th row: 3.0

Value      Count  Frequency (%)
1.0        1717   40.5%
2.0        1252   29.5%
3.0        688    16.2%
4.0        473    11.2%
(Missing)  110    2.6%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
1.0    1717   41.6%
2.0    1252   30.3%
3.0    688    16.7%
4.0    473    11.5%

Most occurring characters

Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8260   66.7%
Other Punctuation  4130   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4130   50.0%
1      1717   20.8%
2      1252   15.2%
3      688    8.3%
4      473    5.7%

Other Punctuation
Value  Count  Frequency (%)
.      4130   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12390  100.0%

Most frequent character per script

Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12390  100.0%

Most frequent character per block

Value  Count  Frequency (%)
.      4130   33.3%
0      4130   33.3%
1      1717   13.9%
2      1252   10.1%
3      688    5.6%
4      473    3.8%

currentSmoker
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 3
Missing (%): 0.1%
Memory size: 33.2 KiB

0.0  2143
1.0  2094

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12711
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value      Count  Frequency (%)
0.0        2143   50.5%
1.0        2094   49.4%
(Missing)  3      0.1%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0.0    2143   50.6%
1.0    2094   49.4%

Most occurring characters

Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8474   66.7%
Other Punctuation  4237   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      6380   75.3%
1      2094   24.7%

Other Punctuation
Value  Count  Frequency (%)
.      4237   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12711  100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12711  100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      6380   50.2%
.      4237   33.3%
1      2094   16.5%

cigsPerDay
Real number (ℝ≥0)

ZEROS

Distinct: 33
Distinct (%): 0.8%
Missing: 31
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.001900689
Minimum: 0
Maximum: 70
Zeros: 2145
Zeros (%): 50.6%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 20
95th percentile: 30
Maximum: 70
Range: 70
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.92074175
Coefficient of variation (CV): 1.324247196
Kurtosis: 1.023052615
Mean: 9.001900689
Median Absolute Deviation (MAD): 0
Skewness: 1.247912038
Sum: 37889
Variance: 142.1040838
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=33)]

Common values

Value              Count  Frequency (%)
0                  2145   50.6%
20                 734    17.3%
30                 217    5.1%
15                 210    5.0%
10                 143    3.4%
9                  130    3.1%
5                  120    2.8%
3                  100    2.4%
40                 80     1.9%
1                  67     1.6%
Other values (23)  263    6.2%

Minimum 5 values

Value  Count  Frequency (%)
0      2145   50.6%
1      67     1.6%
2      18     0.4%
3      100    2.4%
4      9      0.2%

Maximum 5 values

Value  Count  Frequency (%)
70     1      < 0.1%
60     11     0.3%
50     6      0.1%
45     3      0.1%
43     56     1.3%

BP Meds
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 60
Missing (%): 1.4%
Memory size: 33.2 KiB

0.0  4056
1.0  124

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12540
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value      Count  Frequency (%)
0.0        4056   95.7%
1.0        124    2.9%
(Missing)  60     1.4%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0.0    4056   97.0%
1.0    124    3.0%

Most occurring characters

Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8360   66.7%
Other Punctuation  4180   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8236   98.5%
1      124    1.5%

Other Punctuation
Value  Count  Frequency (%)
.      4180   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12540  100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12540  100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      8236   65.7%
.      4180   33.3%
1      124    1.0%

prevalentStroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 9
Missing (%): 0.2%
Memory size: 33.2 KiB

0.0  4206
1.0  25

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12693
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value      Count  Frequency (%)
0.0        4206   99.2%
1.0        25     0.6%
(Missing)  9      0.2%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0.0    4206   99.4%
1.0    25     0.6%

Most occurring characters

Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8462   66.7%
Other Punctuation  4231   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8437   99.7%
1      25     0.3%

Other Punctuation
Value  Count  Frequency (%)
.      4231   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12693  100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12693  100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      8437   66.5%
.      4231   33.3%
1      25     0.2%

prevalentHyp
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 33.2 KiB

0.0  2922
1.0  1316

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12714
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Value      Count  Frequency (%)
0.0        2922   68.9%
1.0        1316   31.0%
(Missing)  2      < 0.1%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0.0    2922   68.9%
1.0    1316   31.1%

Most occurring characters

Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8476   66.7%
Other Punctuation  4238   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      7160   84.5%
1      1316   15.5%

Other Punctuation
Value  Count  Frequency (%)
.      4238   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12714  100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12714  100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      7160   56.3%
.      4238   33.3%
1      1316   10.4%

diabetes
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 33.2 KiB

0.0  4129
1.0  109

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12714
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value      Count  Frequency (%)
0.0        4129   97.4%
1.0        109    2.6%
(Missing)  2      < 0.1%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0.0    4129   97.4%
1.0    109    2.6%

Most occurring characters

Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8476   66.7%
Other Punctuation  4238   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8367   98.7%
1      109    1.3%

Other Punctuation
Value  Count  Frequency (%)
.      4238   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12714  100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12714  100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      8367   65.8%
.      4238   33.3%
1      109    0.9%

tot cholesterol
Real number (ℝ≥0)

MISSING

Distinct: 248
Distinct (%): 5.9%
Missing: 60
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 236.6772727
Minimum: 107
Maximum: 696
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 107
5th percentile: 170
Q1: 206
Median: 234
Q3: 263
95th percentile: 312
Maximum: 696
Range: 589
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 44.61609832
Coefficient of variation (CV): 0.1885102773
Kurtosis: 4.131474784
Mean: 236.6772727
Median Absolute Deviation (MAD): 29
Skewness: 0.8736339696
Sum: 989311
Variance: 1990.596229
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
240                 85     2.0%
220                 70     1.7%
260                 62     1.5%
210                 61     1.4%
232                 59     1.4%
250                 56     1.3%
200                 56     1.3%
230                 54     1.3%
225                 54     1.3%
205                 53     1.2%
Other values (238)  3570   84.2%
(Missing)           60     1.4%

Minimum 5 values

Value  Count  Frequency (%)
107    1      < 0.1%
113    1      < 0.1%
119    1      < 0.1%
124    1      < 0.1%
126    1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
696    1      < 0.1%
600    1      < 0.1%
464    1      < 0.1%
453    1      < 0.1%
439    1      < 0.1%

Systolic BP
Real number (ℝ≥0)

Distinct: 234
Distinct (%): 5.5%
Missing: 4
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.3623702
Minimum: 83.5
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 83.5
5th percentile: 104
Q1: 117
Median: 128
Q3: 144
95th percentile: 175
Maximum: 295
Range: 211.5
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.03924407
Coefficient of variation (CV): 0.1665068708
Kurtosis: 2.15409061
Mean: 132.3623702
Median Absolute Deviation (MAD): 13
Skewness: 1.144615741
Sum: 560687
Variance: 485.7282791
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
120                 107    2.5%
130                 102    2.4%
110                 96     2.3%
115                 88     2.1%
125                 88     2.1%
124                 83     2.0%
122                 80     1.9%
128                 73     1.7%
126                 73     1.7%
123                 72     1.7%
Other values (224)  3374   79.6%

Minimum 5 values

Value  Count  Frequency (%)
83.5   2      < 0.1%
85     1      < 0.1%
85.5   1      < 0.1%
90     2      < 0.1%
92     1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
295    1      < 0.1%
248    1      < 0.1%
244    1      < 0.1%
243    1      < 0.1%
235    1      < 0.1%

Diastolic BP
Real number (ℝ≥0)

Distinct: 146
Distinct (%): 3.4%
Missing: 5
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.90188902
Minimum: 48
Maximum: 142.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 48
5th percentile: 66
Q1: 75
Median: 82
Q3: 90
95th percentile: 104.65
Maximum: 142.5
Range: 94.5
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.9144673
Coefficient of variation (CV): 0.1437176818
Kurtosis: 1.273033246
Mean: 82.90188902
Median Absolute Deviation (MAD): 7.5
Skewness: 0.7126932766
Sum: 351089.5
Variance: 141.9545311
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
80                  262    6.2%
82                  152    3.6%
85                  137    3.2%
70                  135    3.2%
81                  129    3.0%
84                  122    2.9%
90                  118    2.8%
78                  116    2.7%
87                  113    2.7%
75                  108    2.5%
Other values (136)  2843   67.1%

Minimum 5 values

Value  Count  Frequency (%)
48     1      < 0.1%
50     1      < 0.1%
51     1      < 0.1%
52     2      < 0.1%
53     1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
142.5  1      < 0.1%
140    1      < 0.1%
136    2      < 0.1%
135    2      < 0.1%
133    2      < 0.1%

BMI
Real number (ℝ≥0)

Distinct: 1363
Distinct (%): 32.3%
Missing: 24
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.79891603
Minimum: 15.54
Maximum: 56.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 15.54
5th percentile: 20.06
Q1: 23.07
Median: 25.395
Q3: 28.04
95th percentile: 32.7725
Maximum: 56.8
Range: 41.26
Interquartile range (IQR): 4.97

Descriptive statistics

Standard deviation: 4.075256108
Coefficient of variation (CV): 0.1579622998
Kurtosis: 2.666443184
Mean: 25.79891603
Median Absolute Deviation (MAD): 2.485
Skewness: 0.982521942
Sum: 108768.23
Variance: 16.60771235
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
22.19                18     0.4%
22.54                18     0.4%
23.48                18     0.4%
22.91                18     0.4%
25.09                16     0.4%
23.09                16     0.4%
23.1                 13     0.3%
25.23                13     0.3%
22.73                13     0.3%
23.68                12     0.3%
Other values (1353)  4061   95.8%
(Missing)            24     0.6%

Minimum 5 values

Value  Count  Frequency (%)
15.54  1      < 0.1%
15.96  1      < 0.1%
16.48  1      < 0.1%
16.59  2      < 0.1%
16.61  1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
56.8   1      < 0.1%
51.28  1      < 0.1%
45.8   1      < 0.1%
45.79  1      < 0.1%
44.71  1      < 0.1%

heartRate
Real number (ℝ≥0)

Distinct: 72
Distinct (%): 1.7%
Missing: 4
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.86779981
Minimum: 44
Maximum: 143
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 44
5th percentile: 60
Q1: 68
Median: 75
Q3: 83
95th percentile: 98
Maximum: 143
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.99948806
Coefficient of variation (CV): 0.1581631218
Kurtosis: 0.8483320778
Mean: 75.86779981
Median Absolute Deviation (MAD): 7
Skewness: 0.630294538
Sum: 321376
Variance: 143.9877138
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
75                 563    13.3%
80                 384    9.1%
70                 305    7.2%
60                 231    5.4%
85                 228    5.4%
72                 222    5.2%
65                 196    4.6%
90                 172    4.1%
68                 151    3.6%
100                98     2.3%
Other values (62)  1686   39.8%

Minimum 5 values

Value  Count  Frequency (%)
44     1      < 0.1%
45     2      < 0.1%
46     1      < 0.1%
47     1      < 0.1%
48     5      0.1%

Maximum 5 values

Value  Count  Frequency (%)
143    1      < 0.1%
140    1      < 0.1%
125    3      0.1%
122    2      < 0.1%
120    7      0.2%

glucose
Real number (ℝ≥0)

MISSING

Distinct: 143
Distinct (%): 3.7%
Missing: 391
Missing (%): 9.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.95193557
Minimum: 40
Maximum: 394
Zeros: 0
Zeros (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 40
5th percentile: 62
Q1: 71
Median: 78
Q3: 87
95th percentile: 108
Maximum: 394
Range: 354
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 23.95842785
Coefficient of variation (CV): 0.2923473093
Kurtosis: 58.7214474
Mean: 81.95193557
Median Absolute Deviation (MAD): 8
Skewness: 6.217638805
Sum: 315433
Variance: 574.0062651
Monotonicity: Not monotonic
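Glucose has by far the largest skewness (about 6.2) and kurtosis (about 58.7) in the dataset, which reflects a long right tail: the median is 78 while the maximum is 394. The minimal sketch below implements the bias-corrected sample estimators G1 (skewness) and G2 (excess kurtosis), which are what pandas' `Series.skew` and `Series.kurt` compute. That the report uses exactly these estimators is an assumption. The function names are illustrative.

```python
def _central_moments(values):
    """Return n, m2, m3, m4: the biased (divide-by-n) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    n, m2, m3, _ = _central_moments(values)
    if n < 3 or m2 == 0:
        raise ValueError("need at least 3 values that are not all equal")
    return (n * (n - 1)) ** 0.5 / (n - 2) * m3 / m2 ** 1.5


def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis G2; 0 for a normal distribution."""
    n, m2, _, m4 = _central_moments(values)
    if n < 4 or m2 == 0:
        raise ValueError("need at least 4 values that are not all equal")
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


# A single large value in a small sample produces a positive (right) skew.
print(sample_skewness([1, 1, 1, 1, 10]), sample_excess_kurtosis([1, 2, 3, 4, 5]))
```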
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
75                  193    4.6%
77                  167    3.9%
73                  156    3.7%
80                  153    3.6%
70                  152    3.6%
83                  151    3.6%
78                  148    3.5%
74                  141    3.3%
76                  127    3.0%
85                  126    3.0%
Other values (133)  2335   55.1%
(Missing)           391    9.2%

Minimum 5 values

Value  Count  Frequency (%)
40     2      < 0.1%
43     1      < 0.1%
44     2      < 0.1%
45     4      0.1%
47     3      0.1%

Maximum 5 values

Value  Count  Frequency (%)
394    2      < 0.1%
386    1      < 0.1%
370    1      < 0.1%
368    1      < 0.1%
348    1      < 0.1%

Heart-Att
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

0  3596
1  644

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4240
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

[Figure: Histogram of lengths of the category]

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring characters

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4240   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  4240   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4240   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      3596   84.8%
1      644    15.2%

Interactions

[Figures: pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. This implies that for a linear relationship the slope, i.e. the angle to the x-axis, does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
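The recipe above (covariance divided by the product of the standard deviations) takes only a few lines of plain Python. The sketch below is minimal and uses an illustrative function name. It raises `ZeroDivisionError` if either variable is constant, since r is undefined in that case.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares; the common 1/(n - 1) factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))     # perfectly linear, positive slope
print(pearson_r([1, 2, 3, 4], [30, 50, 70, 90]))  # rescaled y: r is unchanged
```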

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
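The pair-counting definition above translates directly into code. The minimal O(n²) sketch below implements the tau-a variant, in which pairs tied in either variable count as neither concordant nor discordant. Library implementations such as SciPy's `kendalltau` default to the tie-adjusted tau-b, so results differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).

    A pair (i, j) is concordant when x and y move in the same direction,
    discordant when they move in opposite directions, and neither when tied.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six gives (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```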

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available with the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
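Bergsma's bias correction can be written out directly. The minimal sketch below computes the corrected Cramér's V for an r × k contingency table of counts. It uses the plain Pearson chi-squared statistic with no Yates continuity correction, which is an assumption about the report's implementation. The function name is illustrative.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts.

    phi2      = chi2 / n
    phi2_corr = max(0, phi2 - (k - 1)(r - 1) / (n - 1))
    r_corr    = r - (r - 1)^2 / (n - 1), and likewise k_corr
    V         = sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    if n < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("table needs n >= 2 and no empty rows or columns")
    # Pearson chi-squared statistic without continuity correction.
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(r)
        for j in range(k)
    )
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[30, 10], [10, 30]]))  # the uncorrected V would be 0.5
```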

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
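The nullity views described above are built from a per-column is-missing indicator. The minimal sketch below assumes two things: that the bar view shows non-missing counts per column, and that the heatmap shows the Pearson correlation between the columns' is-missing indicators, as the missingno library's heatmap does. Columns are plain lists with `None` for missing entries. The function names and sample columns are illustrative.

```python
import math


def nullity_counts(columns):
    """Non-missing count per column; missing entries are None."""
    return {name: sum(v is not None for v in col) for name, col in columns.items()}


def nullity_correlation(col_a, col_b):
    """Pearson correlation between two columns' is-missing indicators.

    Returns None when either column is never (or always) missing, because the
    correlation is then undefined.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return None
    return sab / math.sqrt(saa * sbb)


# Hypothetical columns: b is missing exactly where a is, c exactly where a is not.
a = [1, None, 3, None]
b = ["x", None, "y", None]
c = [None, 2, None, 4]
print(nullity_counts({"a": a, "b": b, "c": c}))
print(nullity_correlation(a, b), nullity_correlation(a, c))
```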

Sample

First rows

      Gender  age   education  currentSmoker  cigsPerDay  BP Meds  prevalentStroke  prevalentHyp  diabetes  tot cholesterol  Systolic BP  Diastolic BP  BMI    heartRate  glucose  Heart-Att
0     Male    39.0  4.0        0.0            0.0         0.0      0.0              0.0           0.0       195.0            106.0        70.0          26.97  80.0       77.0     0
1     Female  46.0  2.0        0.0            0.0         0.0      0.0              0.0           0.0       250.0            121.0        81.0          28.73  95.0       76.0     0
2     Male    48.0  1.0        1.0            20.0        0.0      0.0              0.0           0.0       245.0            127.5        80.0          25.34  75.0       70.0     0
3     Female  61.0  3.0        1.0            30.0        0.0      0.0              1.0           0.0       225.0            150.0        95.0          28.58  65.0       103.0    1
4     Female  46.0  3.0        1.0            23.0        0.0      0.0              0.0           0.0       285.0            130.0        84.0          23.10  85.0       85.0     0
5     Female  43.0  2.0        0.0            0.0         0.0      0.0              1.0           0.0       228.0            180.0        110.0         30.30  77.0       99.0     0
6     Female  63.0  1.0        0.0            0.0         0.0      0.0              0.0           0.0       205.0            138.0        71.0          33.11  60.0       85.0     1
7     Female  45.0  2.0        1.0            20.0        0.0      0.0              0.0           0.0       313.0            100.0        71.0          21.68  79.0       78.0     0
8     Male    52.0  1.0        0.0            0.0         0.0      0.0              1.0           0.0       260.0            141.5        89.0          26.36  76.0       79.0     0
9     Male    43.0  1.0        1.0            30.0        0.0      0.0              1.0           0.0       225.0            162.0        107.0         23.61  93.0       88.0     0

Last rows

      Gender  age   education  currentSmoker  cigsPerDay  BP Meds  prevalentStroke  prevalentHyp  diabetes  tot cholesterol  Systolic BP  Diastolic BP  BMI    heartRate  glucose  Heart-Att
4230  Female  56.0  1.0        1.0            3.0         0.0      0.0              1.0           0.0       268.0            170.0        102.0         22.89  57.0       NaN      0
4231  Male    58.0  3.0        0.0            0.0         0.0      0.0              1.0           0.0       187.0            141.0        81.0          24.96  80.0       81.0     0
4232  Male    68.0  1.0        0.0            0.0         0.0      0.0              1.0           0.0       176.0            168.0        97.0          23.14  60.0       79.0     1
4233  Male    50.0  1.0        1.0            1.0         0.0      0.0              1.0           0.0       313.0            179.0        92.0          25.97  66.0       86.0     1
4234  Male    51.0  3.0        1.0            43.0        0.0      0.0              0.0           0.0       207.0            126.5        80.0          19.71  65.0       68.0     0
4235  Female  48.0  2.0        1.0            20.0        NaN      0.0              0.0           0.0       248.0            131.0        72.0          22.00  84.0       86.0     0
4236  Female  44.0  1.0        1.0            15.0        0.0      0.0              0.0           0.0       210.0            126.5        87.0          19.16  86.0       NaN      0
4237  Female  52.0  2.0        0.0            0.0         0.0      0.0              0.0           0.0       269.0            133.5        83.0          21.47  80.0       107.0    0
4238  Male    40.0  3.0        0.0            0.0         0.0      0.0              1.0           0.0       185.0            141.0        98.0          25.60  67.0       72.0     0
4239  Female  39.0  3.0        1.0            30.0        0.0      0.0              0.0           0.0       196.0            133.0        86.0          20.91  85.0       80.0     0